Key idea: the evolution of outcomes among control units allows you to impute how outcomes among treated units would have evolved had they not been treated
1.1.1 Logic
We have group \(A\) that enters treatment at some point and group \(B\) that never does
The estimate:
\[\hat\tau = (\mathbb{E}[Y^A | post] - \mathbb{E}[Y^A | pre]) -(\mathbb{E}[Y^B | post] - \mathbb{E}[Y^B | pre])\] (how different is the change in \(A\) compared to the change in \(B\)?)
\[\hat\tau_{ATT} = \tau_{ATT} + \text{Difference in trends}\]
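Concretely, \(\hat\tau\) can be computed from four cell means; a minimal sketch with made-up numbers:

```r
# Made-up cell means: group A is treated in the post period, B never treated
means <- data.frame(
  group  = c("A", "A", "B", "B"),
  period = c("pre", "post", "pre", "post"),
  Y      = c(2, 7, 1, 3))

did <- with(means,
  (Y[group == "A" & period == "post"] - Y[group == "A" & period == "pre"]) -
  (Y[group == "B" & period == "post"] - Y[group == "B" & period == "pre"]))

did  # (7 - 2) - (3 - 1) = 3
```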
1.1.3 Simplest case
```r
n_units <- 2

design <-
  declare_model(
    unit = add_level(N = n_units, I = 1:N),
    time = add_level(N = 6, T = 1:N, nest = FALSE),
    obs = cross_levels(by = join_using(unit, time))) +
  declare_model(potential_outcomes(Y ~ I + T^.5 + Z*T)) +
  declare_assignment(Z = 1*(I > (n_units/2))*(T > 3)) +
  declare_measurement(Y = reveal_outcomes(Y ~ Z)) +
  declare_inquiry(
    ATE = mean(Y_Z_1 - Y_Z_0),
    ATT = mean(Y_Z_1[Z == 1] - Y_Z_0[Z == 1])) +
  declare_estimator(Y ~ Z, label = "naive") +
  declare_estimator(Y ~ Z + I, label = "FE1") +
  declare_estimator(Y ~ Z + as.factor(T), label = "FE2") +
  declare_estimator(Y ~ Z + I + as.factor(T), label = "FE3")
```
1.1.4 Diagnosis
Here only the two-way fixed effects estimator (FE3) is unbiased, and only for the ATT.
The ATT here is averaging over effects for treated units (later periods only). We know nothing about the size of effects in earlier periods when all units are in control!
```r
design |> diagnose_design()
```
| Inquiry | Estimator | Bias        |
|---------|-----------|-------------|
| ATE     | FE1       | 2.25 (0.00) |
| ATE     | FE2       | 6.50 (0.00) |
| ATE     | FE3       | 1.50 (0.00) |
| ATE     | naive     | 5.40 (0.00) |
| ATT     | FE1       | 0.75 (0.00) |
| ATT     | FE2       | 5.00 (0.00) |
| ATT     | FE3       | 0.00 (0.00) |
| ATT     | naive     | 3.90 (0.00) |
1.1.5 The classic graph
```r
design |>
  draw_data() |>
  ggplot(aes(T, Y, color = unit)) +
  geom_line() +
  geom_point(aes(T, Y_Z_0)) +
  theme_bw()
```
```r
design |>
  redesign(n_units = 10) |>
  draw_data() |>
  ggplot(aes(T, Y, color = unit)) +
  geom_line() +
  geom_point(aes(T, Y_Z_0)) +
  theme_bw()
```
1.1.8 In practice
Need to defend parallel trends
Most typically using an event study
Sometimes: report balance between treatment and control groups in covariates
Placebo leads and lags
1.1.9 Heterogeneity
Things get much more complicated when there is (a) heterogeneous timing in treatment take-up and (b) heterogeneous effects
It’s only recently been appreciated how tricky things can get
But we already have an intuition from our analysis of trials with heterogeneous assignment and heterogeneous effects:
in such cases fixed effects analysis weights stratum level treatment effects by the variance in assignment to treatment
something similar happens here
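The variance-weighting result can be verified in a small sketch: with constant effects within strata, regressing on treatment plus stratum fixed effects recovers a weighted average of stratum effects with weights \(n_s p_s (1-p_s)\) (strata, shares, and effects below are made up):

```r
# Stratum 1: half treated, effect 1; stratum 2: one in ten treated, effect 3
df <- data.frame(
  s = rep(c(1, 2), each = 10),
  Z = c(rep(1, 5), rep(0, 5),  rep(1, 1), rep(0, 9)))
tau <- c(1, 3)                           # stratum-level effects
df$Y <- 5 * (df$s == 2) + tau[df$s] * df$Z

fe_est <- coef(lm(Y ~ Z + factor(s), data = df))["Z"]
fe_est                                   # fixed-effects estimate

# variance weights: n_s * p_s * (1 - p_s) = 10*.25 and 10*.09
(10*.25*1 + 10*.09*3) / (10*.25 + 10*.09)  # matches fe_est (not the simple mean 2)
```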
1.1.10 Staggered assignments
Just two units assigned at different times:
```r
trend <- 0

design <-
  declare_model(
    unit = add_level(N = 2, ui = rnorm(N), I = 1:N),
    time = add_level(N = 6, ut = rnorm(N), T = 1:N, nest = FALSE),
    obs = cross_levels(by = join_using(unit, time))) +
  declare_model(potential_outcomes(Y ~ trend*T + (1 + Z)*(I == 2))) +
  declare_assignment(Z = 1*((I == 1) * (T > 3) + (I == 2) * (T > 5))) +
  declare_measurement(Y = reveal_outcomes(Y ~ Z), I_c = I - mean(I)) +
  declare_inquiry(mean(Y_Z_1 - Y_Z_0)) +
  declare_estimator(Y ~ Z, label = "1. naive") +
  declare_estimator(Y ~ Z + I, label = "2. FE1") +
  declare_estimator(Y ~ Z + as.factor(T), label = "3. FE2") +
  declare_estimator(Y ~ Z + I + as.factor(T), label = "4. FE3") +
  declare_estimator(Y ~ Z*I_c + as.factor(T), label = "5. Sat")
```
1.1.11 Staggered assignments diagnosis
| Estimator | Mean Estimand | Mean Estimate |
|-----------|---------------|---------------|
| 1. naive  | 0.50 (0.00)   | -0.12 (0.00)  |
| 2. FE1    | 0.50 (0.00)   | 0.36 (0.00)   |
| 3. FE2    | 0.50 (0.00)   | -1.00 (0.00)  |
| 4. FE3    | 0.50 (0.00)   | 0.25 (0.00)   |
| 5. Sat    | 0.50 (0.00)   | 0.50 (0.00)   |
1.1.12 Where do these numbers come from?
The estimand is .5 – this comes from weighting the effect for unit 1 (0) and the effect for unit 2 (1) equally
The naive estimate is wildly off because it does not take into account that units with different treatment shares have different average levels in outcomes
1.1.13 Where do these numbers come from?
The estimate when we control for unit is 0.36: this comes from weighting the unit-stratum level effects according to the variance of assignment to each stratum:
The estimate when we control for time is -1: this comes from weighting the time-stratum level effects according to the variance of assignment to each stratum
it weights periods 4 and 5 only, and equally, yielding the difference between the outcome for unit 1 in treatment (0) and for unit 2 in control (1)
You are interested in the effects of influx of refugees on right wing voting
You have (say more conservative) states with no refugees at any period
You have (say more liberal) states with refugees post 2016 only
You want to do differences in differences comparing these states before and after
However you worry that things change differentially in the conservative and liberal states: no parallel trends
But you can identify areas within states that are more or less likely to be exposed and compare differences in differences in the exposed and unexposed groups.
1.1.26 Triple differencing
So:
Two types of states: \(L \in \{0,1\}\), only \(L=1\) types get refugee influx
Two time periods: \(Post \in \{0,1\}\), refugee influx occurs in period \(Post = 1\)
Two groups: \(B \in \{0,1\}\), \(B=1\) types affected by refugee influx
\[Y = \beta_0 + \beta_1 L + \beta_2 B + \beta_3 Post + \beta_4 LB + \beta_5 L Post + \beta_6 B Post + \beta_7L B Post + \epsilon\]
1.1.27 Triple differencing
\[\frac{\partial ^3Y}{\partial L \partial B \partial Post} = \beta_7\]
1.1.28 Can we not just condition on the \(B=1\) types?
The level among the \(B=1\) types is:
\[Y = \beta_0 + \beta_1 L + \beta_2 + \beta_3 Post + \beta_4 L + \beta_5 L Post + \beta_6 Post + \beta_7L Post + \epsilon\]

If you did simple before/after differences among the \(B=1\) types you would get:

\[\Delta Y| L = 1, B = 1 = \beta_3 + \beta_5 + \beta_6 + \beta_7\]
\[\Delta Y| L = 0, B = 1 = \beta_3 + \beta_6\]
1.1.29 Can we not just condition on the \(B=1\) types?
And so if you differenced again you would get:
\[\Delta^2 Y| B = 1 = \beta_5 + \beta_7\]

So the problem is that you have \(\beta_5\) in here, which corresponds exactly to how the \(L=1\) states change differentially over time.
1.1.30 Triple
But we can recover \(\beta_5\) by doing a diff-in-diff among the \(B=0\) units.
\[Y|B = 0 = \beta_0 + \beta_1 L + \beta_3 Post + \beta_5 L Post\]
\[\Delta^2 Y| B = 0 = \beta_5\]
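The algebra above can be checked numerically; a sketch with made-up coefficient values and no error term:

```r
# Made-up coefficients; b7 is the effect of interest
b0 <- 1; b1 <- .2; b2 <- .3; b3 <- .4; b4 <- .5; b5 <- .6; b6 <- .7; b7 <- .8

cells <- expand.grid(L = 0:1, B = 0:1, Post = 0:1)
cells$Y <- with(cells, b0 + b1*L + b2*B + b3*Post +
                b4*L*B + b5*L*Post + b6*B*Post + b7*L*B*Post)

# diff-in-diff (over L and Post) within a subset of the data
dd <- function(d) with(d,
  (Y[L == 1 & Post == 1] - Y[L == 1 & Post == 0]) -
  (Y[L == 0 & Post == 1] - Y[L == 0 & Post == 0]))

dd(subset(cells, B == 1))   # beta5 + beta7
dd(subset(cells, B == 0))   # beta5
ddd <- dd(subset(cells, B == 1)) - dd(subset(cells, B == 0))
ddd                         # isolates beta7
```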
1.1.31 Easier to swallow?
The identifying assumption is that absent treatment the differences in trends between \(L=0\) and \(L=1\) would be the same for units with \(B=0\) and \(B=1\).
In a sense this is one parallel trends assumption, not two
Puzzle: Is it possible for an effect to be identified correctly by a difference-in-differences design but incorrectly by a triple-differences design?
1.2 Survey experiments
Survey experiments are used to measure things: nothing (except answers) should be changed!
If the experiment in the survey is changing things then it is a field experiment in a survey, not a survey experiment
1.2.1 The list experiment: Motivation
Multiple survey experimental designs have been developed to make it easier for subjects to answer sensitive questions
The key idea is to use inference rather than measurement.
Subjects are placed in different conditions and the conditions affect the answers that are given in such a way that you can infer some underlying quantity of interest
1.2.2 The list experiment: Motivation
This is an obvious DAG, but the main point is to be clear that the value is the quantity of interest and that this value is not affected by the treatment, \(Z\).
1.2.3 The list experiment: Motivation
The list experiment supposes that:
Subjects do not want to give a direct answer to a question
They nevertheless are willing to truthfully answer an indirect question
In other words: sensitivities notwithstanding, they are happy for the researcher to make correct inferences about them or their group
1.2.4 The list experiment: Strategy
Respondents are given a short list and a long list.
The long list differs from the short list in having one extra item—the sensitive item
We ask how many items in each list a respondent agrees with:
\(Y_i(0)\) is the number of elements on a short list that a respondent agrees with
\(Y_i(1)\) is the number of elements on a long list that a respondent agrees with
\(Y_i(1) - Y_i(0)\) is an indicator for whether an individual agrees with the sensitive item
\(\mathbb{E}[Y_i(1) - Y_i(0)]\) is the share of people agreeing with sensitive item
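Under random assignment to lists, this share can be estimated as a simple difference in mean counts; a minimal sketch with made-up responses:

```r
# Z = 1: respondent saw the long list; counts are made up
Z <- rep(0:1, each = 4)
Y <- c(1, 2, 1, 2,    # short-list counts
       2, 3, 1, 3)    # long-list counts

share <- mean(Y[Z == 1]) - mean(Y[Z == 0])
share  # estimated share agreeing with the sensitive item: 2.25 - 1.5 = 0.75
```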
1.2.5 The list experiment: Simplified example
How many of these do you agree with?

| Short list       | Long list                | "Effect"        |
|------------------|--------------------------|-----------------|
| "2+2 = 4"        | "2+2 = 4"                |                 |
| "2×3 = 6"        | "2×3 = 6"                |                 |
| "3+6 = 8"        | "Climate change is real" |                 |
|                  | "3+6 = 8"                |                 |
| Answer: Y(0) = 2 | Y(1) = 2                 | Y(1) - Y(0) = 0 |
[Note: this is obviously not a good list. Why not?]
Accuracy and protection trade off against each other: the more accuracy you have, the less protection you have
1.2.14 Individual or group effects?
This is typically used to estimate average levels
However you can use it in the obvious way to get average levels for groups: this is equivalent to calculating group level heterogeneous effects
Extending the idea you can even get individual level estimates: for instance you might use causal forests
You can also use this to estimate the effect of an experimental treatment on an item that’s measured using a list, without requiring individual level estimates:
Note that here we looked at “hiders” – people not answering the direct question truthfully
See Li (2019) on bounds when the “no liars” assumption is threatened — this is about whether people respond truthfully to the list experimental question
1.3 Regression discontinuity
Errors and diagnostics
1.3.1 Design
```r
library(rdss) # for helper functions
library(rdrobust)

cutoff <- 0.5
bandwidth <- 0.5

control <- function(X) {
  as.vector(poly(X, 4, raw = TRUE) %*% c(.7, -.8, .5, 1))}
treatment <- function(X) {
  as.vector(poly(X, 4, raw = TRUE) %*% c(0, -1.5, .5, .8)) + .25}

rdd_design <-
  declare_model(
    N = 1000,
    U = rnorm(N, 0, 0.1),
    X = runif(N, 0, 1) + U - cutoff,
    D = 1 * (X > 0),
    Y_D_0 = control(X) + U,
    Y_D_1 = treatment(X) + U
  ) +
  declare_inquiry(LATE = treatment(0) - control(0)) +
  declare_measurement(Y = reveal_outcomes(Y ~ D)) +
  declare_sampling(S = X > -bandwidth & X < bandwidth) +
  declare_estimator(Y ~ D*X, term = "D", label = "lm") +
  declare_estimator(
    Y, X, term = "Bias-Corrected",
    .method = rdrobust_helper,
    label = "optimal"
  )
```
1.3.2 RDD Data plotted
Note rdrobust implements:
local polynomial Regression Discontinuity (RD) point estimators
robust bias-corrected confidence intervals
See Calonico, Cattaneo, and Titiunik (2014) and related papers; see also `?rdrobust::rdrobust`
1.3.3 RDD Data plotted
```r
rdd_design |>
  draw_data() |>
  ggplot(aes(X, Y, color = factor(D))) +
  geom_point(alpha = .3) +
  theme_bw() +
  geom_smooth(aes(X, Y_D_0)) +
  geom_smooth(aes(X, Y_D_1)) +
  theme(legend.position = "none")
```
1.3.4 RDD diagnosis
```r
rdd_design |> diagnose_design()
```
| Estimator | Mean Estimate | Bias         | SD Estimate | Coverage    |
|-----------|---------------|--------------|-------------|-------------|
| lm        | 0.23 (0.00)   | -0.02 (0.00) | 0.01 (0.00) | 0.64 (0.02) |
| optimal   | 0.25 (0.00)   | 0.00 (0.00)  | 0.03 (0.00) | 0.89 (0.01) |
1.3.5 Bandwidth tradeoff
```r
rdd_design |>
  redesign(bandwidth = seq(from = 0.05, to = 0.5, by = 0.05)) |>
  diagnose_designs()
```
As we increase the bandwidth, the lm bias gets worse, but slowly, while the error falls.
The best bandwidth is relatively wide.
This is even more the case for the optimal estimator.
1.4 Noncompliance and the LATE estimand
1.4.1 LATE—Local Average Treatment Effects
Sometimes you give a medicine but only a nonrandom sample of people actually try to use it. Can you still estimate the medicine’s effect?
|     | X=0                                | X=1                                |
|-----|------------------------------------|------------------------------------|
| Z=0 | \(\overline{y}_{00}\) (\(n_{00}\)) | \(\overline{y}_{01}\) (\(n_{01}\)) |
| Z=1 | \(\overline{y}_{10}\) (\(n_{10}\)) | \(\overline{y}_{11}\) (\(n_{11}\)) |
Say that people are one of 3 types:
\(n_a\) “always takers” have \(X=1\) no matter what and have average outcome \(\overline{y}_a\)
\(n_n\) “never takers” have \(X=0\) no matter what with outcome \(\overline{y}_n\)
\(n_c\) “compliers” have \(X=Z\) and average outcomes \(\overline{y}^1_c\) if treated and \(\overline{y}^0_c\) if not.
1.4.2 LATE—Local Average Treatment Effects
Sometimes you give a medicine but only a non random sample of people actually try to use it. Can you still estimate the medicine’s effect?
Average in \(Z=0\) group: \(\frac{{n_c} \overline{y}^0_{c}+ \left(n_{n}\overline{y}_{n} +{n_a} \overline{y}_a\right)}{n_a+n_c+n_n}\)
Average in \(Z=1\) group: \(\frac{{n_c} \overline{y}^1_{c} + \left(n_{n}\overline{y}_{n} +{n_a} \overline{y}_a \right)}{n_a+n_c+n_n}\)
Difference: \(ITT = ({\overline{y}^1_c-\overline{y}^0_c})\frac{n_c}{n} \rightarrow LATE = ITT\times\frac{n}{n_c}\)
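The Wald logic in the last line can be sketched with made-up data in which everyone who takes up treatment gains exactly 2:

```r
# Made-up data: Z is assignment, X is take-up
df <- data.frame(
  Z = rep(0:1, each = 5),
  X = c(0, 0, 0, 1, 1,    # Z = 0 group: two always-takers
        0, 1, 1, 1, 1))   # Z = 1 group: one never-taker
df$Y <- 2 * df$X + 1      # exclusion restriction: Y depends on Z only via X

itt   <- with(df, mean(Y[Z == 1]) - mean(Y[Z == 0]))  # effect of assignment
itt_d <- with(df, mean(X[Z == 1]) - mean(X[Z == 0]))  # complier share n_c / n
itt / itt_d               # Wald estimate: recovers the take-up effect of 2
```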
1.4.4 The good and the bad of LATE
You get a well-defined estimate even when there is non-random take-up
May sometimes be used to assess mediation or knock-on effects
But:
You need assumptions (monotonicity and the exclusion restriction – where were these used above?)
Your estimate is only for a subpopulation
The subpopulation is not chosen by you and is unknown
Different encouragements may yield different estimates since they may encourage different subgroups
1.4.5 Pearl and Chickering again
With and without an imposition of monotonicity
```r
data("lipids_data")

models <-
  list(
    unrestricted = make_model("Z -> X -> Y; X <-> Y"),
    restricted = make_model("Z -> X -> Y; X <-> Y") |>
      set_restrictions("X[Z=1] < X[Z=0]")) |>
  lapply(update_model, data = lipids_data, refresh = 0)

models |>
  query_model(
    query = list(CATE = "Y[X=1] - Y[X=0]", Nonmonotonic = "X[Z=1] < X[Z=0]"),
    given = list("X[Z=1] > X[Z=0]", TRUE),
    using = "posteriors")
```
1.4.6 Pearl and Chickering again
With and without an imposition of monotonicity:
| model        | query        | mean | sd   |
|--------------|--------------|------|------|
| unrestricted | CATE         | 0.70 | 0.05 |
| restricted   | CATE         | 0.71 | 0.05 |
| unrestricted | Nonmonotonic | 0.01 | 0.01 |
| restricted   | Nonmonotonic | 0.00 | 0.00 |
In one case we assume monotonicity; in the other we update on it (easy in this case because of the empirically verifiable nature of one-sided noncompliance)
1.5 Spillovers
1.5.1 SUTVA violations (Spillovers)
Spillovers can lead you to estimate weaker effects even when true effects are stronger.
The key problem is that \(Y(1)\) and \(Y(0)\) are not sufficient to describe potential outcomes
1.5.2 SUTVA violations
More completely specified potential outcomes (and estimands)
| Unit | Location | \(D_\emptyset\) | \(y(D_\emptyset)\) | \(D_1\) | \(y(D_1)\) | \(D_2\) | \(y(D_2)\) | \(D_3\) | \(y(D_3)\) | \(D_4\) | \(y(D_4)\) |
|------|----------|---|---|---|---|---|---|---|---|---|---|
| A | 1 | 0 | 0 | 1 | 3 | 0 | 1 | 0 | 0 | 0 | 0 |
| B | 2 | 0 | 0 | 0 | 3 | 1 | 3 | 0 | 3 | 0 | 0 |
| C | 3 | 0 | 0 | 0 | 0 | 0 | 3 | 1 | 3 | 0 | 3 |
| D | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 3 |
| \(\bar{y}_\text{treated}\) | | | - | | 3 | | 3 | | 3 | | 3 |
| \(\bar{y}_\text{untreated}\) | | | 0 | | 1 | | 4/3 | | 4/3 | | 1 |
| \(\bar{y}_\text{neighbors}\) | | | - | | 3 | | 2 | | 2 | | 3 |
| \(\bar{y}_\text{pure control}\) | | | 0 | | 0 | | 0 | | 0 | | 0 |
| ATT (direct effect) | | | - | | 3 | | 3 | | 3 | | 3 |
| ATT (indirect effect) | | | - | | 3 | | 2 | | 2 | | 3 |

Table: Potential outcomes for four units for different treatment profiles, \(D_1\)–\(D_4\). \(D_i\) represents an allocation to treatment and \(y_j(D_i)\) is the potential outcome for (row) unit \(j\) given (column) allocation \(i\).
1.5.3 SUTVA violations
| Unit | Location | \(D_\emptyset\) | \(y(D_\emptyset)\) | \(D_1\) | \(y(D_1)\) | \(D_2\) | \(y(D_2)\) | \(D_3\) | \(y(D_3)\) | \(D_4\) | \(y(D_4)\) |
|------|----------|---|---|---|---|---|---|---|---|---|---|
| A | 1 | 0 | 0 | 1 | 3 | 0 | 1 | 0 | 0 | 0 | 0 |
| B | 2 | 0 | 0 | 0 | 3 | 1 | 3 | 0 | 3 | 0 | 0 |
| C | 3 | 0 | 0 | 0 | 0 | 0 | 3 | 1 | 3 | 0 | 3 |
| D | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 3 |
The key is to think through the structure of spillovers.
Here immediate neighbors are exposed
In this case we can define a direct treatment (being exposed) and an indirect treatment (having a neighbor exposed), and we can work out the propensity for each unit of receiving each type of treatment
These may be non-uniform (here central units are more likely to have treated neighbors); but we can still use the randomization to assess effects
Idea: You can use the design to get a handle on spillovers
1.5.4 Design
```r
dgp <- function(i, Z, G)
  Z[i]/3 + sum(Z[G == G[i]])^2/5 + rnorm(1)

spillover_design <-
  declare_model(
    G = add_level(N = 80),
    j = add_level(N = 3, zeros = 0, ones = 1)) +
  declare_inquiry(
    direct = mean(sapply(1:240, # just i treated v no one treated
      function(i) {
        Z_i <- (1:240) == i
        dgp(i, Z_i, G) - dgp(i, zeros, G)}))) +
  declare_inquiry(
    indirect = mean(sapply(1:240, # all but i treated v no one treated
      function(i) {
        Z_i <- (1:240) == i
        dgp(i, ones - Z_i, G) - dgp(i, zeros, G)}))) +
  declare_assignment(Z = complete_ra(N)) +
  declare_measurement(
    neighbors_treated = sapply(1:N, function(i) sum(Z[-i][G[-i] == G[i]])),
    one_neighbor = as.numeric(neighbors_treated == 1),
    two_neighbors = as.numeric(neighbors_treated == 2),
    Y = sapply(1:N, function(i) dgp(i, Z, G))
  ) +
  declare_estimator(Y ~ Z, inquiry = "direct", model = lm_robust,
                    label = "naive") +
  declare_estimator(Y ~ Z * one_neighbor + Z * two_neighbors,
                    term = c("Z", "two_neighbors"),
                    inquiry = c("direct", "indirect"),
                    label = "saturated", model = lm_robust)
```
1.5.5 Spillovers: direct and indirect treatments
1.5.6 Spillovers: Simulated estimates
1.5.7 Spillovers: Opportunities and Warnings
You can in principle:
debias estimates
learn about interesting processes
optimize design parameters
But to estimate effects you still need some SUTVA-like assumption.
1.5.8 Spillovers: Opportunities and Warnings
In this example, if you compared outcomes between treated units and all control units at least \(n\) positions away from a treated unit, you would get the wrong answer unless \(n \geq 7\).
1.6 Mediation
1.6.1 The problem of unidentified mediators
Consider a causal system like the one below.
The effect of X on M1 and M2 can be measured in the usual way.
But unfortunately, if there are multiple mediators, the effect of M1 (or M2) on Y is not identified.
The ‘exclusion restriction’ is obviously violated when there are multiple mediators (unless you can account for them all).
1.6.2 The problem of unidentified mediators
1.6.3 The problem of unidentified mediators
An obvious approach is to first examine the (average) effect of X on M1 and then use another manipulation to examine the (average) effect of M1 on Y.
But both of these average effects may be positive (for example) even if there is no effect of X on Y through M1.
1.6.4 The problem of unidentified mediators
An obvious approach is to first examine the (average) effect of X on M1 and then use another manipulation to examine the (average) effect of M1 on Y.
Similarly, both of these average effects may be zero even if X affects Y through M1 for every unit!
1.6.5 The problem of unidentified mediators
Another somewhat obvious approach is to see how the effect of \(X\) on \(Y\) in a regression is reduced when you control for \(M\).
If the effect of \(X\) on \(Y\) passes through \(M\) then surely there should be no effect of \(X\) on \(Y\) after you control for \(M\).
This common strategy associated with Baron and Kenny (1986) is also not guaranteed to produce reliable results. See for instance Green, Ha, and Bullock (2010)
1.6.6 Baron Kenny issues
```r
df <-
  fabricate(
    N = 1000,
    U = rbinom(N, 1, .5),
    X = rbinom(N, 1, .5),
    M = ifelse(U == 1, X, 1 - X),
    Y = ifelse(U == 1, M, 1 - M))

list(lm(Y ~ X, data = df), lm(Y ~ X + M, data = df)) |>
  texreg::htmlreg()
```
Statistical models

|             | Model 1  | Model 2  |
|-------------|----------|----------|
| (Intercept) | -0.00*** | -0.00*** |
|             | (0.00)   | (0.00)   |
| X           | 1.00***  | 1.00***  |
|             | (0.00)   | (0.00)   |
| M           |          | -0.00    |
|             |          | (0.00)   |
| R2          | 1.00     | 1.00     |
| Adj. R2     | 1.00     | 1.00     |
| Num. obs.   | 1000     | 1000     |

***p < 0.001; **p < 0.01; *p < 0.05
1.6.7 The problem of unidentified mediators
See Imai on better ways to think about this problem and designs to address it.
1.6.8 The problem of unidentified mediators: Quantities
In the potential outcomes framework we can describe a mediation effect as (see Imai et al): \[\delta_i(t) = Y_i(t, M_i(1)) - Y_i(t, M_i(0)) \textbf{ for } t = 0,1\]
The direct effect is: \[\psi_i(t) = Y_i(1, M_i(t)) - Y_i(0, M_i(t)) \textbf{ for } t = 0,1\]
This is a decomposition, since: \[Y_i(1, M_i(1)) - Y_i(0, M_i(0)) = \frac{1}{2}(\delta_i(1) + \delta_i(0) + \psi_i(1) + \psi_i(0)) \]
If (and it is a big if) there are no interaction effects, i.e. \(\delta_i(1) = \delta_i(0)\) and \(\psi_i(1) = \psi_i(0)\), then \[Y_i(1, M_i(1)) - Y_i(0, M_i(0)) = \delta_i + \psi_i\]
The bad news is that although a single experiment can identify the total effect, it cannot identify the elements of this decomposition.
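To see the decomposition, add and subtract \(Y_i(1, M_i(0))\) (and, for the second route, \(Y_i(0, M_i(1))\)):

\[Y_i(1, M_i(1)) - Y_i(0, M_i(0)) = \underbrace{Y_i(1, M_i(1)) - Y_i(1, M_i(0))}_{\delta_i(1)} + \underbrace{Y_i(1, M_i(0)) - Y_i(0, M_i(0))}_{\psi_i(0)}\]

\[Y_i(1, M_i(1)) - Y_i(0, M_i(0)) = \underbrace{Y_i(1, M_i(1)) - Y_i(0, M_i(1))}_{\psi_i(1)} + \underbrace{Y_i(0, M_i(1)) - Y_i(0, M_i(0))}_{\delta_i(0)}\]

Averaging the two routes gives the \(\frac{1}{2}\) expression.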
1.6.9 The problem of unidentified mediators: Solutions?
Check formal requirement for identification under single experiment design (“sequential ignorability”—that, conditional on actual treatment, it is as if the value of the mediation variable is randomly assigned relative to potential outcomes). But this is strong (and in fact unverifiable) and if it does not hold, bounds on effects always include zero (Imai et al)
You can use interactions with covariates if you are willing to make assumptions on no heterogeneity of direct treatment effects over covariates. eg you think that money makes people get to work faster because they can buy better cars; you look at the marginal effect of more money on time to work for people with and without cars and find it higher for the latter. This might imply mediation through transport but only if there is no direct effect heterogeneity (eg people with cars are less motivated by money).
1.6.10 The problem of unidentified mediators: Solutions?
Weaker assumptions justify parallel design
Group A: \(T\) is randomly assigned, \(M\) left free.
Group B: divided into four groups \(T \times M\). (This requires two more assumptions: (1) the manipulation of the mediator affects outcomes only through the mediator; (2) no interaction: for each unit, \(Y(1,m)-Y(0,m) = Y(1,m')-Y(0,m')\).)
Takeaway: Understanding mechanisms is harder than you think. Figure out what assumptions fly.
1.6.11 In CausalQueries
Let's imagine that sequential ignorability does not hold. What are our posteriors on mediation quantities when in fact all effects are mediated, effects are strong, and we have lots of data?
```r
model <- make_model("X -> M -> Y <- X; M <-> Y")
plot(model)
```
1.6.12 In CausalQueries
We imagine a true model and consider estimands:
```r
truth <-
  make_model("X -> M -> Y") |>
  set_parameters(c(.5, .5, .1, 0, .8, .1, .1, 0, .8, .1))

queries <-
  list(
    indirect = "Y[X = 1, M = M[X=1]] - Y[X = 1, M = M[X=0]]",
    direct = "Y[X = 1, M = M[X=0]] - Y[X = 0, M = M[X=0]]"
  )

truth |>
  query_model(queries) |>
  kable()
```
| query    | given | using      | case_level | mean | sd | cred.low | cred.high |
|----------|-------|------------|------------|------|----|----------|-----------|
| indirect | -     | parameters | FALSE      | 0.64 | NA | 0.64     | 0.64      |
| direct   | -     | parameters | FALSE      | 0.00 | NA | 0.00     | 0.00      |
1.6.13 In CausalQueries
```r
model |>
  update_model(data = truth |> make_data(n = 1000)) |>
  query_distribution(queries = queries, using = "posteriors")
```
If we investigate the causal types we can see that the data is consistent with direct effects only: specifically that whenever \(M\) is responsive to \(X\), \(Y\) is responsive to \(X\).
2 References
Baron, Reuben M., and David A. Kenny. 1986. “The Moderator–Mediator Variable Distinction in Social Psychological Research: Conceptual, Strategic, and Statistical Considerations.” Journal of Personality and Social Psychology 51 (6): 1173.
Goodman-Bacon, Andrew. 2021. “Difference-in-Differences with Variation in Treatment Timing.” Journal of Econometrics 225 (2): 254–77.
Green, Donald P., Shang E. Ha, and John G. Bullock. 2010. “Enough Already about ‘Black Box’ Experiments: Studying Mediation Is More Difficult Than Most Scholars Suppose.” The Annals of the American Academy of Political and Social Science 628 (1): 200–208.
Li, Yimeng. 2019. “Relaxing the No Liars Assumption in List Experiment Analyses.” Political Analysis 27 (4): 540–55.
Olden, Andreas, and Jarle Møen. 2022. “The Triple Difference Estimator.” The Econometrics Journal 25 (3): 531–53.